Mod-CSA: Modularity optimization by conformational space annealing 
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We propose a new modularity optimization method, Mod-CSA, based on stochastic global op- 
timization algorithm, conformational space annealing (CSA). Our method outperforms simulated 
annealing in terms of both efficiency and accuracy, finding higher modularity partitions with less 
computational resources required. The high modularity values found by our method are higher 
than, or equal to, the largest values previously reported. In addition, the method can be combined 
with other heuristic methods, and implemented in parallel fashion, allowing it to be applicable to 
large graphs with more than 10000 nodes. 



I. INTRODUCTION 

Network science has emerged as an important frame- 
work to study complex systems [1, 2]. One of the most 
important properties of networks is the existence of mod- 
ules/communities; communities are subgraphs of densely 
inter-connected nodes, and nodes in a community are 
considered to share common characteristics [3, 4]. Proper 
community detection allows one to determine potentially 
hidden relationships between nodes, and also allows one 
to reduce a large complex network into smaller and com- 
prehensible ones. For this reason, good community detec- 
tion within networks has been a subject of great interest. 
There exist various definitions of community [4-7]. The 
most widely used approach to detect such sub-groups of 
nodes with non-random connections involves the use of 
modularity to quantify the quality of a given partition 
of a network [4, 8, 9]. Using modularity, the community 
detection problem is thus recast as a global optimization 
problem. However, finding the maximum modularity so- 
lution is an NP-hard problem [10], and enumeration of 
all possible partitions is intractable in general. Therefore, 
an efficient optimization algorithm is required to obtain 
high modularity solutions. 

Most of the modularity optimization studies have fo- 
cused on developing fast heuristic methods generating 
reasonable quality community structures. Currently, 
simulated annealing (SA) is considered to be the best 
algorithm [4, 11] and has been adopted in many theoret- 
ical and practical studies where communities with high 
modularity are required [12-14]. 

In this paper, we propose a new modularity maximiza- 
tion method based on conformational space annealing 
(CSA) algorithm [15-19]. We show that CSA outper- 
forms SA both in generating better community structures 
and in computational efficiency. CSA consistently finds 
community structures with higher modularity using less 
computational resources. Moreover, for networks con- 
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taining approximately up to 1000 nodes, CSA repeatedly 
finds converged solutions. Considering the stochastic na- 
ture of the algorithm, this suggests that the converged 
solution is likely to be the optimal solution of the net- 
work. 



II. METHODS 

Let us consider a network with N nodes and M edges. 
Modularity measures the fraction of intra-community 
edges minus its expected value from the null model, a 
randomly rewired network with the same degree assign- 
ments. Modularity is defined as 




where N c is the number of assigned communities, li is 
the number of edges within the community i and Di is 
the sum of degrees of nodes in the community i. 

To benchmark the performance of CSA against that 
of SA, we implemented SA following existing stud- 
ies [12, 20]. Initially, using E = —Q, a simulation starts 
at a high temperature T, to sample broad range of the so- 
lution space as well as to avoid trapping in local-minima. 
As the simulation proceeds, T is slowly decreased to more 
completely explore basins of high modularity. At a given 
T, a set of stochastic movements including N 2 single- 
node moves and TV collective moves consisting of random 
merges and splits of communities, arc carried out. To 
split a community, a 'nested' SA method is used [12, 20], 
which isolates a target community from the entire net- 
work and split it into two communities. Each 'nested' 
SA starts with two randomly separated groups for short 
annealing and the annealed solution serves as a collec- 
tive move. For each trial movement, if Q increases, the 
movement is accepted, otherwise it is accepted with prob- 
ability P = exp ( ^ i T < ^' ^ . After a set of movements are 

tried, T is decreased to aT, where a = 0.995. 

Our method, CSA, is a global optimization method 
which combines essential ingredients of three methods: 
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Monte Carlo with minimization (MCM) [21], genetic al- 
gorithm (GA) [22], and SA [23]. As in MCM, we consider 
only the phase/conformational space of local minima; 
i.e., all solutions are minimized by a local minimizer. As 
in GA, we consider many solutions (called bank in CSA) 
collectively, and we perturb a subset of bank solutions 
by cross-over between solutions and mutation. Finally, 
as in SA, we introduce a parameter D cut , which plays 
the role of the temperature in SA. In CSA, each solution 
is assumed to represent a hyper-sphere of radius D in 
the solution space. Diversity of sampling is directly con- 
trolled by introducing a distance measure between two 
solutions and comparing it with D cut , to deter two solu- 
tions from coming too close to each other. Similar to the 
reduction of T in SA, the value of D cut is slowly reduced 
in CSA; hence the name conformational space annealing. 

To apply CSA to optimize modularity, three ingredi- 
ents are required: (a) we need a local modularity max- 
imizcr for a given network partition, (b) we need a dis- 
tance measure between two Q-maximized network parti- 
tions, and (c) we need ways to combine two parent par- 
titions to generate a daughter partition which will be 
Q-maximizcd subsequently 

Here, the community structure is represented by as- 
signing an index to each node, where nodes with an iden- 
tical index belong to the same community. For local max- 
imization of Q, we use a quench procedure which accepts 
a move if and only if it improves Q, equivalent to SA at 
T = 0. 

The distance between two community structures is 
measured by the variation of information (VI) [24] . For 
two given partitions of X and Y, VI is defined as 

VI(X,Y) = H(X,Y) - I(X;Y) 

where H is the entropy function and I is the mutual 



information function of the probability p(x, y) 



where n is the number of total nodes, xjy refers to a 
community from X/Y, and n SiS is the number of nodes 
shared by x and y. With H and / defined by 

H(X,Y) = -^p(x,y)logp(x,y) 
x,y 
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VI can be reduced to 
VI(X,Y) = - 



I^n^logf%V (2) 



where p(x) — n x /n and n x is the number of nodes in 
community x. If A is identical to Y , it is easy to show 



that VI(X, Y) = 0. We have also tried other measures 
such as Rand index [25] and normalized mutual informa- 
tion (NMI) [26] and they gave no significant difference in 
results. 

In CSA, we first generate 50 random partitions which 
are subsequently maximized by quench procedures. We 
call these solutions as the first bank which is kept un- 
changed during the optimization. We make a copy of the 
first bank, and call it the bank. The partitions in the 
bank are updated by better solutions found during the 
course of optimization. The initial value of D cut is set 
as D avg /2, where D avg is the average distance between 
partitions in the first bank. A number of partitions (30 
in this study) in the bank are selected as seed partitions. 
With each seed, 20 trial partitions arc generated by cross- 
over between the seed and a randomly chosen partition 
from either the bank or the first bank. An additional 5 
are generated by random mutation of the seed. 

For a cross-over, we use two operations, a convergent 
copy and a divisive copy, shown in Figure 1. In both 
cases, one community represented by an index is ran- 
domly selected from a source solution and it is copied 
into a target solution after assigning a new index. For 
the convergent copy, the new index is chosen from one 
of the neighboring indices of the copied nodes from the 
target in a random fashion. For the divisive copy, a new 
index not present in the target is chosen. The rationale of 
using these operators is that the community index itself 
has no particular meaning, while a well-defined commu- 
nity structure from one solution can serve as an advanta- 
geous feature that should be preserved to generate a bet- 
ter solution. For each operation, the minimum number 
of nodes that should be copied are randomly determined 
between 1% to 40% of total nodes and the above oper- 
ation is repeated until the total number of copied nodes 
exceeds the number. 

For mutation, random merge and split operators were 
introduced. The random merge was carried out by com- 
bining two adjacent communities. The random split op- 
erator divides a community into two randomly assigned 
groups. All trial conformations generated by cross-over 
and mutation operations are optimized by quench proce- 
dures. It should be noted that only local moves are used 
in the quench procedures since the divergent and divisive 
copy operators can act as the merge and split moves used 
in SA. 

After local-maximization of trial partitions, these par- 
titions arc used to update the bank. The modularity of 
a trial partition a is compared with the modularities of 
partitions in the bank. If a is worse than the worst par- 
tition of the bank, it is discarded. Otherwise, we find 
the partition A in the bank which is closest to a, as de- 
termined by distance D(a,A). If D(a, A) < D cut , a is 
considered as similar to A and it replaces A if a > A. 
If a < A it is discarded. If D(a,A) > D cut , a is re- 
garded as a new partition similar to none in the bank, 
and it replaces the worst existing partition, that is, it re- 
places the lowest modularity partition in the bank. We 
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FIG. 1. (Color online) Two cross-over operators, (A) con- 
vergent copy and (B) divisive copy are shown. In (A) the 
community indexed by 1 from the source is copied into the 
target, and the new indices are set to 1 or 3 with the prob- 
ability of 2/3 and 1/3. In (B) the community indexed by 3 
from the source is copied into the target and the new index 2 
is assigned. 

carry out this operation for all trial partitions. With up- 
dated bank, new seed partitions are selected again from 
the bank which have not yet been used as seeds. This 
entire process of generating partitions by perturbation 
and subsequent local maximization and updating bank 
is repeated until all partitions in the bank are used as 
seeds. At each iteration step, D cut is reduced with a pre- 
determined ratio. After D cut reaches to its final value, 
Davg/5, it is kept constant. 

Once all partitions in the bank are used as seeds with- 
out generating better partitions, implying that the pro- 
cedure might have reached a deadlock, we reset all bank 
partitions to be eligible for seeds and repeat another 
round of search procedure. After this additional search 
also reaches a deadlock, we expand our search space by 
adding an additional 50 randomly generated and opti- 
mized partitions to the bank, and repeat the whole pro- 
cedure. In this study, we terminated our calculation after 
100 partitions were used as seeds. Additional adding cy- 
cles should be considered for more rigorous optimization, 
especially for problems with high complexity. 



III. RESULTS 



To compare performance of CSA and SA, we applied 
CSA and SA to a number of real-world networks com- 
monly used in existing modularity optimization studies, 
shown in Table I. All networks considered are undi- 
rected and unweighted. Due to the stochastic nature 
of both methods, we performed 50 independent simu- 
lations for each method. The results are summarized 



Network 


Nodes 


Edges 


Dolphins 


62 


159 


Les Miserables 


77 


254 


Political books 


105 


441 


College football 


115 


613 


Jazz 


198 


2742 


USAir97 


332 


2126 


Netscience_main 


379 


914 


C. elegans 


453 


2025 


Electronic Circuit (s838) 


512 


819 


E-mail 


1133 


5451 


Erdos02 


6927 


11850 


PGP 


10680 


24316 


condmat2003 


27519 


116181 



TABLE I. Number of nodes and edges of benchmark networks 
used in this study are displayed. 

in Table II. The maximum, average and standard de- 
viation of modularity values obtained by both methods 
are displayed. As a measure of required computational 
resources, we counted the number of function evalua- 
tions performed until the maximum modularity solution 
is found, N max . We observe that CSA consistently finds 
equal or higher modularity solutions than does SA for all 
networks tested, with a smaller number of function evalu- 
ations. To demonstrate the search efficiency of CSA more 
clearly, we also measured the number of function evalua- 
tions required by CSA to generate a solution equivalent 
to the best modularity obtained by SA, which is denoted 
as N^?g^ 1 in Table II. CSA clearly requires many fewer 
function evaluations to generate solutions better than the 
best ones obtained by SA. For small networks (e.g. up to 
the Jazz musician network), CSA finds the best solution 
with less than 10% of the function evaluations required by 
SA and for the worst case, the C. elegans network, CSA 
requires only 25% of the function evaluations of SA. 



It should be noted that CSA can be applied to networks 
containing more than 10 3 nodes where for SA this is im- 
practical. For the three largest networks in Table II, CSA 
found good solutions within a reasonable computational 
time whereas SA runs did not yield reasonable value of 
modularity within 2 days of wall clock time. This differ- 
ence in computational time reflects a number of factors. 
It is partly due to the high parallel efficiency of the CSA 
algorithm [35]. In SA, generation of a trial solution is 
dependent on its previous state, which makes it imprac- 
tical to implement the algorithm in a parallel fashion. 
However, the majority of computation in CSA consists of 
independent local maximization procedures on hundreds 
of trial solutions generated by cross-over and mutation, 
and each maximization can be separately carried out. 
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0.0036 
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0.74 
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.56001 





,56001 





0. 


56001 





,55194 


0.0071 


0.362 


0.362 


0.01 


0.18 


Political books 





.52724 





52724 





0. 


52724 





,52723 





0.055 


0.019 


0.07 


2.52 


College football 





.60457 





,60457 





0. 


,60457 





,60457 





0.093 


0.093 


0.05 


0.26 


Jazz 





.44514 





,44514 








,44487 





,44477 


1.6e-4 


0.073 


0.052 


0.17 


679.4 


USAir97 





.36824 





,36824 








35376 





,34787 


0.0044 


0.271 


0.010 


0.13 


429.2 


Netscience_main 





.84859 





,84859 





0. 


,84383 





,83544 


0.0044 


0.345 


0.019 


1.3 


263.3 


C. elegans 





.45325 


0. 


,45325 





0. 


,45212 





,44927 


0.0026 


0.960 


0.246 


16.8 


2512.3 


Electronic Circuit (s838) 





.81936 





81936 








,81871 





,80812 


0.0048 


0.639 


0.424 
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.58283 





,58282 


2.2e-5 





,58198 





,58015 


0.0015 


0.510 


0.119 


73.6 


42296 
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.71843 





,71782 


3.2e-4 
















3356 
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.88675 





,88648 


l.le-4 
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.76745 





,76484 


0.0010 
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TABLE II. Modularity optimization results obtained by 50 seperate runs of Mod-CSA and SA are displayed. Q ma x denotes 
the maximum modularity value, Qavg the average of maximum modularity value of each run, a the standard deviation of the 
modularity value, J\[ max the number of function evaluations until the calculation reached the maximum modularity, Nq 1 ^ 
the number of function evaluations required for CSA runs to sample equal or higher modularity solutions than the maximum 
modularity of SA runs, t the average execution time to find the best solution in seconds. All simulations were performed on 
Intel Nehalem core @ 2.93GHz. CSA and SA runs were performed with 64 cpus and single cpu, respectively. For the three 
largest networks, SA results are not available because calculations were not finished within 2 days. 
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0.3682 
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Netscience_main 
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0.45325 
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[29] 


E-mail 


10 


0.58283 


0.582 








[34] 


Erdos02 


40 
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0.7162 








[30] 


PGP 


100 


0.88674 


0.8841 








[30, 34] 


condmat2003 


80 


0.76745 


0.761 








[31] 



TABLE III. Comparison between the maximum modularity values obtained by CSA, Q ma x, with previously published ones, 
Qpub, and the maximum values obtained by the exact method [29], Q op t, is displayed. N c denotes the number of communities 
found by CSA. Source indicates the reference that the modularity value is collected. %op\ denotes the percentage of SA runs 
that reached to the optimal modularity community structure. 



The quench procedure in CSA consists of local moves 
only, which is rather fast with large networks. On the 
other hand, the most time-consuming operation in SA is 
the splitting move by the nested SA procedure which we 
find is indeed essential to obtain good SA solutions. In 
CSA, the operation of the divisive copy when generating 
trial solutions plays the equivalent role of the split move 
in SA. To compare the computing efficiency of CSA with 
existing methods, the time complexities of CSA and SA 
arc estimated based on the simulation results with the 



benchmark networks. As shown in Figure 2, the time 
complexity of CSA is estimated to be 0(n 2 ' 6 ) which is 
comparable to other heuristic methods [36, 37] and bet- 
ter than that of SA, 0(n 4 3 ), where n is the number of 
nodes. 

In terms of convergence, CSA yields more robust so- 
lutions than SA. Except the political books and college 
football networks, the maximum modularity solution 
found by SA varied from simulation to simulation. 
For networks containing over 300 nodes, SA failed 
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Network size (n) 



optimal modularity solutions and the method is not free 
from the problem of the resolution limit arising from 
using modularity [38]. However, the CSA procedure and 
operators proposed in this work are general, and can be 
used to optimize other fitness functions. To overcome 
the resolution limit issue, more robust fitness functions 
should be considered to be combined with CSA, such as 
the map equation [39] or the partition density [7]. It 
should be noted that the current work can be extended 
to deal with directed or weighted networks in conjunc- 
tion with modified modularity functions [40, 41]. In 
order to handle large networks, CSA can be combined 
with other efficient heuristics, such as the fast unfolding 
method [42], instead of the stochastic quench procedure 
used in this study. 



FIG. 2. (Color online) Comparison of time complexities of 
CSA (red, circle) and SA (blue, cross) is shown. Network size, 
n, represents the number of nodes, and CPU time corresponds 
to the average time to find final solutions in seconds. 

to sample the optimal solution, which raises serious 
concerns when applying SA to modularity optimization 
for practical use [11]. However, for all test networks up 
to about 10 3 nodes, all CSA runs converged to the same 
solution, except the E-mail network where 41 out of 50 
converged. Considering the small size of networks and 
the stochastic nature of the algorithm, we believe that 
the converged solution of each network is likely to be the 
true maximum modularity of the network. 

We also compared maximum modularities obtained 
by CSA with the maximum values from previous pub- 
lications; see Table III. CSA finds equivalent or higher 
Q values compared to existing studies in all networks 
tested. 

Recently, the exact maximum modularity values of 
several small benchmark networks up to 512 nodes were 
reported; they are displayed in Table III as Q op t [29]. 
We performed 50 independent runs for these networks 
and all runs converged to the optimal solutions without 
exception. This result supports the hypothesis that 
CSA is efficient enough to find the putative maximum 
modularity solution for a network containing up to 10 3 
nodes. 

CSA algorithm presented in this work aims to obtain 



IV. CONCLUSION 

In this paper, we propose a new modularity optimiza- 
tion method based on conformational space annealing 
algorithm, Mod-CSA. Compared to SA, our method is 
faster. Further, while it finds equivalent modularity 
partitions for relatively small networks, for the larger 
more challenging ones, it typically finds higher modu- 
larity partitions. For small networks consisting up to 10 3 
nodes, despite its stochastic nature, Mod-CSA solutions 
converge to an identical solution, which appears to be 
the best solution possible; this is not possible in other 
stochastic algorithms. Mod-CSA can be implemented in 
a highly parallel fashion and is thus applicable to large 
networks where SA is not. In addition, Mod-CSA can 
be extended to deal with large networks by using fast 
heuristic methods as a local optimizer. 
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